Nature Biomedical Engineering
Springer Science and Business Media LLC
Preprints posted in the last 7 days, ranked by how well they match Nature Biomedical Engineering's content profile, based on 42 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit.
Fischer, J.; Spindler, M. P.; Britton, G. J.; Weiler, J.; Tankelevich, M.; Dai, D.; Canales-Herrerias, P.; Jha, D.; Rajpal, U.; Mehandru, S.; Faith, J. J.
Our understanding of human mucosal T cell clonotype distribution in health and disease has centered on immunodominant antigens. We performed single cell T cell receptor (TCR) and RNA sequencing as an untargeted approach to define distributions of T cell clonal groups in health and ulcerative colitis (UC) across 333,088 T cells in colon and peripheral blood. Healthy donor-specific TCR repertoires had limited blood-colon clonal sharing, which was highest in cytotoxic T effector memory (Tem) populations and lowest in regulatory T cells (Tregs), reflecting tissue-based compartmentalization. Within healthy colon, TCR repertoires showed high T cell clonal sharing independent of anatomic distance, associated with high intra-clonal phenotypic diversity. Colon cytotoxic and Th17 populations showed high dispersion across sites, while Tregs were compartmentalized. Clonal lineages dispersed across blood and colon upregulated trafficking markers, suggesting active movement between tissues, while those dispersed across colon sites upregulated residency markers, suggesting intra-colon repertoire sharing is mediated by long-term, slow-moving clonal groups. In UC, Tregs were expanded across inflamed sites, and an increased number of CD8 Tem clonal groups showed greater dispersion regardless of inflammation. These findings reveal principles of T cell clonal organization in the human colon during health and disease, identifying opposing patterns of clonal dispersion among Treg and Th17 clonal groups, high phenotypic diversity within dispersed clonal groups, and elevated cross-colon dispersion of CD8 Tem clonotypes in UC.
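Clonal sharing between compartments, as quantified in the abstract above, reduces in its simplest form to set overlap between clonotype repertoires. A minimal sketch (the Jaccard index on CDR3 sequences is one common choice, not necessarily the metric used in this study; the sequences below are made up):

```python
def clonal_sharing(repertoire_a, repertoire_b):
    """Jaccard index between two sets of TCR clonotypes (e.g. CDR3 sequences)."""
    a, b = set(repertoire_a), set(repertoire_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

blood = {"CASSLGQGYEQYF", "CASSPDRGNTIYF", "CASSQEGTGELFF"}
colon = {"CASSLGQGYEQYF", "CASSFSGNTIYF"}
print(clonal_sharing(blood, colon))  # 1 shared / 4 total = 0.25
```

Low blood-colon values and high colon-colon values of such an overlap statistic would reflect the tissue compartmentalization the authors describe.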
Nguyen, T. M.; Woods, C.; Liu, J.; Wang, C.; Lin, A.-L.; Cheng, J.
The apolipoprotein E ε4 (APOE4) allele is the strongest genetic risk factor for late-onset Alzheimer's disease (AD), the most common form of dementia. APOE4 carriers exhibit cerebrovascular and metabolic dysfunction, structural brain alterations, and gut microbiome changes decades before the onset of clinical symptoms. A better understanding of the early manifestation of these physiological changes is critical for the development of timely AD interventions and risk reduction protocols. Multimodal datasets encompassing a wide range of APOE4- and AD-associated biomarkers provide a valuable opportunity to gain insight into the APOE4 phenotype; however, these datasets often present analytical challenges due to small sample sizes and high heterogeneity. Here, we propose a two-stage multimodal AI model (APOEFormer) that integrates blood metabolites, brain vascular and structural MRI, microbiome profiles, and other clinical and demographic data to predict APOE4 allele status. In the first stage, modality-specific encoders are used to generate initial representations of input data modalities, which are aligned in a shared latent space via self-supervised contrastive learning during pretraining. This objective encourages the learning of informative and consistent representations across modalities by leveraging cross-modality relationships. In the second stage, the pretrained representations are used as inputs to a multimodal transformer that integrates information across modalities to predict a key AD risk genetic variant (APOE4). Across 10 independent experimental runs with different train-validation-test splits, APOEFormer predicts whether an individual carries an APOE4 allele with an average accuracy of 75%, demonstrating robust performance under limited sample sizes.
Post hoc perturbation analysis of the predictive model revealed valuable insights into the driving components of the APOE4 phenotype, including key blood biomarkers and brain regions strongly associated with APOE4.
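The first-stage cross-modality alignment described above is commonly implemented with a symmetric InfoNCE-style contrastive objective. A minimal NumPy sketch, assuming paired per-subject embeddings from two modality encoders (the exact loss used by APOEFormer may differ):

```python
import numpy as np

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def contrastive_alignment_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE: row i of z_a and row i of z_b come from the same
    subject (two modalities); matched pairs are pulled together in the
    shared latent space, mismatched pairs pushed apart."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                  # scaled cosine similarities
    log_p_ab = logits - logsumexp(logits, axis=1)       # a -> b direction
    log_p_ba = logits.T - logsumexp(logits.T, axis=1)   # b -> a direction
    return -(np.diag(log_p_ab).mean() + np.diag(log_p_ba).mean()) / 2

z = np.eye(4)  # toy embeddings: perfectly aligned modalities
print(contrastive_alignment_loss(z, z) < contrastive_alignment_loss(z, z[::-1]))  # True
```

Aligned pairs yield a near-zero loss, while mismatched pairs are heavily penalized, which is what drives the modalities into a consistent shared space.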
Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.
Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines do not match well with the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis would generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of the candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable were often different between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework to identify safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.
Sooknah, M.; Srinivasan, R.; Sankarapandian, S.; Chen, Z.; Xu, J.
Genome-wide association studies (GWAS) have transformed our understanding of human biology, but are constrained by the need for predefined phenotypes. We introduce Vector2Variant (V2V), a general-purpose framework that transforms any set of high-dimensional measurements (such as machine learning embeddings) into a genome-wide scan for associations, without requiring rigid specification of a phenotype. Rather than testing genetic variants against single traits, V2V finds the axis in multivariate space along which carriers and non-carriers maximally differ, and produces a continuous "projection phenotype" that can be interpreted by association with disease labels. The projection phenotypes correlate with orthogonal clinical biomarkers never seen during training, suggesting the learned axes capture biologically meaningful variation. We applied V2V to imaging, timeseries, and omics modalities in the UK Biobank and recovered established biology (like the role of CASP9 in renal failure) without the need for targeted measurements, alongside novel associations including a frameshift variant in LRRIQ1 (potentially protective for cardiovascular disease). V2V is computationally efficient at genome-wide scale, producing summary statistics and disease associations that facilitate target prioritization without the need for phenotype engineering.
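The core operation described above (finding the axis along which carriers and non-carriers maximally differ) can be illustrated with a Fisher-discriminant direction in embedding space; V2V's actual estimator is more elaborate, so treat this as a conceptual sketch on synthetic data:

```python
import numpy as np

def projection_phenotype(embeddings, carrier):
    """Fisher-discriminant sketch: estimate the axis w along which carriers
    and non-carriers differ most, and return the continuous per-sample
    'projection phenotype' obtained by projecting onto it."""
    X1, X0 = embeddings[carrier], embeddings[~carrier]
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    # pooled within-group covariance, ridge-regularized for stability
    S = np.cov(X1, rowvar=False) + np.cov(X0, rowvar=False)
    w = np.linalg.solve(S + 1e-6 * np.eye(S.shape[0]), diff)
    return embeddings @ w

# synthetic embeddings: carriers shifted along one latent direction
rng = np.random.default_rng(0)
controls = rng.normal(size=(100, 3))
carriers = rng.normal(size=(100, 3))
carriers[:, 0] += 3.0
X = np.vstack([controls, carriers])
is_carrier = np.repeat([False, True], 100)
scores = projection_phenotype(X, is_carrier)
```

The returned scores are continuous, so they can then be correlated with disease labels or orthogonal biomarkers, as the abstract describes.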
Oh, J.; Steele, A. G.; Scheffler, M.; Martin, C.; Sheynin, J.; Dietz, V. A.; Valdivia-Padilla, A.; Stampas, A.; Korupolu, R.; Karmonik, C.; Hodics, T. M.; Freyvert, Y.; Manzella, M.; Faraji, A. H.; Horner, P. J.; Sayenko, D. G.
Cervical spinal cord injury (SCI) causes profound and persistent loss of hand function, and effective neuromodulation strategies remain limited. We report the first-in-human implantation of a 32-contact cervical epidural paddle array in two individuals with severe chronic SCI. Individualized motor pool recruitment maps, derived from systematic bipolar and multipolar configurations, enabled person-specific stimulation parameters. Optimized stimulation restored volitional hand opening, closing and coordinated upper-limb movements that were previously unattainable. This approach achieved a >91% success rate in complex reach-grasp-lift-release sequences, supported by substantial gains in range of motion, grip, and pinch strength. Electrophysiological and kinematic analyses demonstrated parameter-dependent, selective recruitment of flexor and extensor motor pools. Personalized stimulation programs integrated with goal-directed activities enabled functional hand use in home and community settings, sustained over several months of continued autonomous use. These findings establish a mechanistically grounded and translational framework for restoring upper-limb function after chronic severe SCI.
Halder, S.; Kim, C. M.; Periwal, V.
Cardiac arrhythmias are abnormal heart rhythms characterized by disordered electrical dynamics that impair cardiac function and pose a major global burden of morbidity and mortality. Early and accurate prediction of arrhythmic anomalies from physiological time series is crucial for effective intervention, yet remains challenging due to the nonlinear, nonstationary, and individualized nature of cardiac dynamics. Despite significant advances in machine learning-based arrhythmia detection, most existing methods operate as static classifiers on electrocardiographic signals and lack online prediction, patient-specific adaptation, and mechanistic interpretability. From a dynamical-systems perspective, arrhythmias represent qualitative regime transitions, often preceded by subtle, temporally extended deviations that are difficult to detect in real time. Here we introduce CASCADE (Chaotic Attractor Sensitivity for Cardiac Anomaly Detection), an online and personalized anomaly forecasting framework built on a special type of reservoir computing called Dynamical Systems Machine Learning (DynML). DynML employs ensembles of continuous-time nonlinear dynamical systems as chaotic reservoirs to reconstruct and forecast short-term cardiac dynamics on a beat-to-beat basis, training only a linear readout. This design enables efficient online adaptation without retraining the underlying dynamical model. Rather than relying on static beat-level classification, CASCADE identifies arrhythmic events as failures of short-term predictability, manifested as statistically significant deviations between predicted and observed dynamics relative to subject-specific baselines. Detection performance is governed by the intrinsic dynamical complexity of the reservoir, quantified by topological entropy. 
Reservoirs operating near critical entropy regimes optimally amplify subtle, temporally extended irregularities in heartbeat dynamics, rendering incipient arrhythmic signatures linearly separable at the readout level. Topological entropy thus serves both as a predictor of model performance and a principled control parameter for reservoir design. When evaluated on the MIT-BIH Arrhythmia dataset, CASCADE achieved consistently high F1 scores, precision, recall, and overall accuracy across diverse patient populations, demonstrating strong generalizability across clinical and real-world settings. By integrating chaotic reservoir computing, entropy-guided tuning, and online personalized forecasting, CASCADE reframes arrhythmia detection as a problem of dynamical regime transition rather than static classification. This perspective provides a scalable, interpretable, and computationally efficient framework for real-time cardiac monitoring and early-warning clinical decision support.
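The prediction-failure idea at the heart of CASCADE can be illustrated with a generic discrete-time echo-state reservoir and a purely linear readout (the paper's DynML reservoirs are continuous-time dynamical systems; the toy sine "heartbeat" below only demonstrates the detection logic):

```python
import numpy as np

def reservoir_states(u, n_res=50, rho=0.9, seed=0):
    """Drive a fixed random reservoir with input signal u; the reservoir
    itself is never trained, only the linear readout is (echo-state style)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1, 1, n_res)
    W = rng.normal(size=(n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # fix spectral radius
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x.copy())
    return np.array(states)

def fit_readout(S, y, ridge=1e-4):
    """Ridge-regressed linear readout: the only trained component."""
    return np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ y)

# toy 'heartbeat': clean oscillation whose tail is corrupted (the anomaly)
signal = np.sin(np.linspace(0, 40, 400))
signal[350:] += np.random.default_rng(1).normal(0, 0.5, 50)
S = reservoir_states(signal[:-1])
w = fit_readout(S[:200], signal[1:201])       # train on the clean baseline
err = np.abs(S @ w - signal[1:])              # one-step prediction error
threshold = err[:200].mean() + 4 * err[:200].std()  # subject-specific baseline
# anomaly flagged wherever short-term predictability collapses (err > threshold)
```

The detection statistic is the deviation of observed from predicted dynamics relative to the subject's own baseline, mirroring the abstract's framing of arrhythmia as a failure of short-term predictability.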
Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.
As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARF's enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF-guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]).
Cross-model validation confirms identical security metrics, supporting HAARF's model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment--shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
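The "deterministic middleware enforcement" result above hinges on the gate sitting outside the model: whatever the agent generates, an unauthorized tool call never executes. A minimal sketch with hypothetical tool and policy names (HAARF's actual requirement set is far richer than an allowlist):

```python
class ToolGateMiddleware:
    """Deterministic allowlist gate between an agent and clinical tools:
    a call not authorized for the agent's autonomy level is refused before
    execution, no matter what the model generated (all names hypothetical)."""
    def __init__(self, policy):
        self.policy = policy      # {autonomy_level: set of permitted tool names}
        self.audit_log = []       # decision traceability for oversight review

    def invoke(self, level, tool, payload, registry):
        allowed = tool in self.policy.get(level, set())
        self.audit_log.append((level, tool, "allow" if allowed else "deny"))
        if not allowed:
            return {"status": "denied", "tool": tool}
        return {"status": "ok", "result": registry[tool](payload)}

registry = {"read_ehr": lambda p: f"summary for {p['patient_id']}"}
gate = ToolGateMiddleware({"level_1": {"read_ehr"}})
ok = gate.invoke("level_1", "read_ehr", {"patient_id": "p-001"}, registry)
denied = gate.invoke("level_1", "order_medication", {"drug": "x"}, registry)
print(ok["status"], denied["status"])  # ok denied
```

Because the check is a set lookup rather than a model judgment, the deny decision is reproducible across agent models, which is what the cross-model validation in the abstract is probing.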
Ma, Z.; Qiao, Y.
Background: The enterotype concept proposed that gut microbiomes cluster into discrete types, but subsequent critiques demonstrated that such clustering depends on methodological choices, that the number of clusters is not fixed, and that faecal samples cannot capture spatial heterogeneity along the gastrointestinal tract. The stomach remains particularly understudied, and no systematic classification exists for gastric microbial community types. Methods: We assembled a multi-cohort dataset of 566 gastric mucosal samples spanning healthy controls to gastric cancer, with both Helicobacter pylori (HP)-negative and HP-positive individuals. Critically, we applied the key methodological lessons of the enterotype debate: we used a variational autoencoder (VAE) for dimensionality reduction to learn a continuous latent representation without forcing discrete structure, determined the optimal number of clusters using the Silhouette index (an absolute validation measure) across K=2 to K=10 rather than arbitrarily selecting a cluster number, and performed transparent evaluation of multiple clustering solutions. This VAE-plus-silhouette workflow directly addresses the critiques leveled against the original enterotype analysis. Results: Four gastotypes were identified, with K=4 achieving the highest mean silhouette score, indicating good cluster cohesion and separation. Two gastotypes (Variovorax-type and Trabulsiella-type) were significantly enriched in HP-positive samples, while two gastotypes (Bacteroides-type and Streptococcus-type) were significantly enriched in HP-negative samples. Random Forest and Gradient Boosting achieved excellent baseline performance for predicting HP infection (AUC = 0.990 and 0.993). Conclusions: The VAE-plus-silhouette workflow provides a robust, data-driven approach for identifying gastotypes without forcing discrete structure or arbitrarily fixing cluster numbers. 
Using this framework, we identified four gastotypes with significantly different HP infection rates. Variovorax-type and Trabulsiella-type showed strong HP-positive enrichment, while Bacteroides-type and Streptococcus-type showed strong HP-negative enrichment. These findings demonstrate that methodological advances from the enterotype controversy can be successfully transferred to the stomach, offering a reproducible taxonomy for stratifying HP infection status with potential clinical utility.
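The cluster-number selection step above is simple to reproduce in isolation: compute the mean silhouette for each candidate K and keep the maximizer. A self-contained sketch with a naive k-means (deterministic quantile-style initialization) and synthetic points standing in for the VAE latent space:

```python
import numpy as np

def kmeans(X, k, iters=50):
    # deterministic init: centers at evenly spaced ranks of the first coordinate
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, k).astype(int)]]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def mean_silhouette(X, labels):
    """Mean silhouette width: within-cluster cohesion (a) vs separation (b)."""
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    idx, s = np.arange(len(X)), []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            s.append(0.0)
            continue
        a = D[i, same & (idx != i)].mean()
        b = min(D[i, labels == j].mean() for j in set(labels) if j != labels[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

# four well-separated synthetic 'communities' in a 2-D latent space
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(c, 0.3, size=(40, 2)) for c in (0, 4, 8, 12)])
best_k = max(range(2, 11), key=lambda k: mean_silhouette(X, kmeans(X, k)))
print(best_k)  # the silhouette criterion recovers K=4 on this separated data
```

Scanning K=2 to K=10 and letting an absolute validity index pick the winner, rather than fixing the cluster count in advance, is the methodological point the abstract carries over from the enterotype debate.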
Dey, S. K.; Qureshi, A. I.; Shyu, C.-R.
Target trial emulation (TTE) enables causal inference from observational data but remains bottlenecked by manual, expert-dependent protocol operationalization. While large language models (LLMs) have advanced clinical knowledge extraction and code generation, their ability to automate end-to-end TTE workflows remains largely unexplored. We present an LLM-driven framework using retrieval-augmented generation to extract the five core TTE design parameters from the Carotid Revascularization and Medical Management for Asymptomatic Carotid Stenosis Trial (CREST-2) protocol and generate executable phenotyping pipelines for real-world EHR data. The performance of the framework was evaluated along two dimensions. First, protocol extraction accuracy was assessed against a gold-standard checklist of trial design components using precision, recall, and F1-score metrics. Second, outcome validity was evaluated through population-level concordance analyses comparing EHR-derived outcomes with published trial endpoints using standardized mean difference, observed-to-expected ratios, confidence interval overlap, and two-proportion z-tests. Further, human-in-the-loop validation assessed the correctness of extracted clinical logic and phenotype definitions. Together, these evaluations demonstrate a structured approach for assessing LLM-driven protocol-to-pipeline translation for scalable real-world evidence generation.
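The first evaluation dimension above is a checklist comparison, which reduces to set-based precision/recall/F1. A sketch with illustrative component names (the abstract does not enumerate the five parameters, so the labels below are placeholders):

```python
def extraction_scores(extracted, gold):
    """Precision, recall, and F1 of extracted protocol components against a
    gold-standard checklist (set comparison on normalized component names)."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {"eligibility", "treatment strategies", "assignment", "outcomes", "follow-up"}
llm = {"eligibility", "treatment strategies", "outcomes", "follow-up", "analysis plan"}
print(extraction_scores(llm, gold))  # precision, recall, F1 all ~0.8 here
```

One missed component and one spurious extraction against a five-item checklist gives 0.8 on all three metrics; the same arithmetic scales to the full gold-standard checklist.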
Jahaniani, F.; Schrodi, S. J.; Weinstein, M.
Public RNA-seq sample sets can refine per-tumor diagnosis and risk, but heterogeneous biology and analytic drift often obscure structure. Dynamic Quantum Clustering (DQC), an unsupervised geometry-preserving method requiring no clinical labels or preset cluster counts, addresses both challenges. Applied to RNA-seq from 692 TCGA gliomas (524 low-grade gliomas (LGG), 168 glioblastomas (GBM); 20,057 protein-coding genes), DQC produced two dominant clusters with 90.9% post hoc diagnostic concordance and clear survival time separation. Filtering genes by inter-cluster mean differences yielded a 554-gene subset that improved accuracy to 97.3%. Rank ordering these genes identified ~90 genes that, under DQC, produced three LGG-pure subclusters with ordered but distinct survival outcomes and one GBM-rich cluster (PPV 97.1%)--the RNA-based clustering without clinical information thereby inherently reveals molecular groupings which mirror critically important clinical features. Comparing these clusters defined four nonoverlapping gene modules and assigned four BioCoords per tumor. DQC with BioCoords recapitulated the LGG-to-GBM continuum with a mesenchymal/invasion-extracellular matrix axis exhibiting a monotonic survival gradient, illustrating how geometry-aware unsupervised learning can translate bench and computational discovery into meaningful biology-based patient stratification and prognosis.
Feng, Y.; Deng, K.; Guan, Y.
Gene networks (GNs) encode diverse molecular relationships and are central to interpreting cellular function and disease. The heterogeneity of interaction types has led to computational methods specialized for particular network contexts. Large language models (LLMs) offer a unified, language-based formulation of GN inference by leveraging biological knowledge from large-scale text corpora, yet their effectiveness remains sensitive to prompt design. Here, we introduce Gene-Relation Adaptive Soft Prompt (GRASP), a parameter-efficient and trainable framework that conditions inference on each gene pair through only three virtual tokens. Using factorized gene-specific and relation-aware components, GRASP learns to map each pair's biological context into compact soft prompts that combine pair-specific signals with shared interaction patterns. Across diverse GN inference tasks, GRASP consistently outperforms alternative prompting strategies. It also shows a stronger ability to recover unannotated interactions from synthetic negative sets, suggesting its capacity to identify biologically meaningful relationships beyond existing databases. Together, these results establish GRASP as a scalable and generalizable prompting framework for LLM-based GN inference.
Undurraga Lucero, J. A.; Chesnaye, M.; Simpson, D.; Laugesen, S.
Objective detection of evoked potentials (EPs) is central to digital diagnostics in hearing assessment and clinical neurophysiology, yet current approaches remain time-intensive and sensitive to inter-individual noise variability. Many existing detection methods rely on population-based assumptions or computationally demanding procedures, limiting robustness and efficiency in real-world clinical settings. We present Fmpi, a digital EP detection framework enabling individualised, real-time response detection through analytical modelling of the spectral colour and temporal dynamics of background noise within each recording. Using extensive simulations and large-scale human electroencephalography datasets spanning brainstem, steady-state, and cortical EPs recorded in adults and infants, we demonstrate performance comparable or superior to state-of-the-art bootstrapped methods while operating at a fraction of the computational cost and maintaining well-controlled sensitivity with improved specificity. Importantly, Fmpi incorporates a futility detection mechanism enabling early termination of uninformative recordings, reducing testing time without compromising diagnostic reliability.
Mille-Fragoso, L. S.; Driscoll, C. L.; Wang, J. N.; Dai, H.; Widatalla, T. M.; Zhang, J. L.; Zhang, X.; Rao, B.; Feng, L.; Hie, B. L.; Gao, X. J.
Obtaining novel antibodies against specific protein targets is a widely important yet experimentally laborious process. Meanwhile, computational methods for antibody design have been limited by low success rates that currently require resource-intensive screening. Here, we introduce Germinal, a broadly enabling generative pipeline that designs antibodies against specific epitopes with nanomolar binding affinities while requiring only low-n experimental testing. Our method co-optimizes antibody structure and sequence by integrating a structure predictor with an antibody-specific protein language model to perform de novo design of functional complementarity-determining regions (CDRs) onto a user-specified structural framework. When tested against four diverse protein targets, Germinal successfully designed functional antibodies across all targets and binder formats, testing only 43-101 designs for each antigen. Validated designs also exhibited robust expression in mammalian cells and high sequence and structural novelty. We provide open-source code and full computational and experimental protocols to facilitate wide adoption. Germinal represents a milestone in efficient, epitope-targeted de novo antibody design, with notable implications for the development of molecular tools and therapeutics.
Sauer, C. M.; Tovey, N.; Ptasinska, A.; Hughes, D.; Stockton, J.; Zumalave, S.; Rust, A. G.; Lynn, C.; Livellara, V.; Sevrin, F.; Himsworth, C.; Muyas, F.; Nicolaidou, M.; Parry, G.; Paisana, E.; Cascao, R.; Ahmed, S. W.; Yasin, S. A.; Portela, L. R.; Balasubramanian, P.; Burke, G. A. A.; Vedi, A.; Faria, C. C.; Marshall, L. V.; Jacques, T. S.; Hubank, M.; Hargrave, D.; George, S.; Angelini, P.; Anderson, J.; Chesler, L.; Beggs, A. D.; Cortes-Ciriano, I.
Cell-free DNA (cfDNA) profiling enables minimally invasive cancer detection and monitoring. We present SIMMA, a low-input single-molecule sequencing approach that enables multimodal whole-genome and high-depth targeted sequencing of the same cfDNA sample for both tumour-agnostic and tumour-informed liquid biopsy analysis. Across 792 plasma and cerebrospinal fluid cfDNA samples from 277 paediatric patients with diverse brain and extracranial tumours, SIMMA enabled tumour diagnosis, detection of driver mutations, and reconstruction of extrachromosomal DNA (ecDNA) months before clinical relapse. Using conformal prediction trained on genome-wide fragmentomics, genomic and epigenomic data, SIMMA predicts disease burden as a continuous variable and provides well-calibrated uncertainty estimates for each sample, achieving a limit of detection of ~100 ppm from low-pass whole-genome sequencing data. In summary, SIMMA establishes the clinical utility of multimodal cfDNA profiling with uncertainty quantification for individual patients and unlocks the potential of ecDNA as a liquid biopsy biomarker for disease detection and monitoring across diverse aggressive malignancies.
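The conformal layer above can be illustrated with the simplest split-conformal recipe for a continuous target: residuals on a held-out calibration set give a quantile that converts any point prediction into a calibrated interval. SIMMA's actual model conditions on fragmentomic and epigenomic features; the numbers below are synthetic:

```python
import numpy as np

def conformal_interval(cal_pred, cal_true, new_pred, alpha=0.1):
    """Split conformal prediction: absolute residuals on a calibration set
    give a quantile q so that [pred - q, pred + q] covers the true value
    with probability >= 1 - alpha (under exchangeability)."""
    resid = np.abs(cal_true - cal_pred)
    n = len(resid)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(resid, level)
    return new_pred - q, new_pred + q

# synthetic 'disease burden' values (e.g. tumour fraction in ppm) and noisy predictions
rng = np.random.default_rng(0)
truth = rng.uniform(0, 1000, size=2000)
pred = truth + rng.normal(0, 50, size=2000)
lo, hi = conformal_interval(pred[:1000], truth[:1000], pred[1000:])
coverage = np.mean((truth[1000:] >= lo) & (truth[1000:] <= hi))
print(round(float(coverage), 3))  # empirical coverage near the 90% target
```

The per-sample interval width is what turns a continuous burden estimate into a "well-calibrated uncertainty estimate for each sample", as the abstract puts it.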
Yang, Z.; Lyng, G. D.; Batra, S. S.; Tillman, R. E.
Medical concept extraction from electronic health records underpins many downstream applications, yet remains challenging because medically meaningful concepts, such as diagnoses, are frequently implied rather than explicitly stated in medical narratives. Existing benchmarks with human-annotated evidence spans underscore the importance of grounding extracted concepts in medical text. However, they predominantly focus on explicitly stated concepts and provide limited coverage of cases in which medically relevant concepts must be inferred. We present MedicalBench, a new benchmark for medical concept extraction with evidence grounding that evaluates implicit medical reasoning. MedicalBench formulates medical concept extraction as a verification task over medical note-concept pairs, coupled with sentence-level evidence identification. Built from MIMIC-IV discharge summaries and human-verified ICD-10 codes, the dataset is curated through a multi-stage large language model (LLM) triage pipeline followed by medical annotation and expert review. It deliberately includes implicit positives, semantically confusable negatives, and cases where LLM judgments disagree with medical expert assessments. Annotators provide sentence-level evidence spans and concise medical rationales. The final dataset contains 823 high-quality examples. We define two complementary evaluation tasks: (1) medical concept extraction and (2) sentence-level evidence retrieval, enabling assessment of both correctness and interpretability. Benchmarking state-of-the-art LLMs and a supervised baseline reveals that performance remains modest, highlighting the difficulty of extracting implicitly expressed concepts.
We further show that explicitly incorporating reasoning cues and prompting to extract implicit evidence substantially improves medical concept extraction, while performance is largely invariant to note length, indicating that MedicalBench isolates reasoning difficulty rather than superficial confounders. MedicalBench provides the first systematic benchmark for implicit, evidence-grounded medical concept extraction, offering a foundation for developing medical language models that can both identify medically relevant concepts and justify their predictions in a transparent and medically faithful manner.
Huang, X.; Hsieh, C.; Nguyen, Q.; Renteria, M. E.; Gharahkhani, P.
Wearable-derived physiological features have been associated with disease risk, but most current studies focus on single conditions, limiting understanding of cross-disease patterns. This study adopts a trans-diagnostic approach to examine whether wearable data capture shared and condition-specific physiological signatures across multiple chronic conditions spanning physical and mental health, and then evaluates the utility of these features for disease classification. A total of 9,301 patients with at least 21 days of consecutive Fitbit data from the All of Us Controlled Tier Dataset version 8 were analyzed. Disease subcohorts included cardiovascular disease (CVD), diabetes, obstructive sleep apnea (OSA), major depressive disorder (MDD), anxiety, bipolar disorder, and attention-deficit/hyperactivity disorder (ADHD), chosen based on prevalence and relevance. Logistic regression and XGBoost models were fitted for each disease subcohort versus the control cohort. We found that compared to using just baseline demographic and lifestyle features, incorporating wearable-derived features enabled improved classification performance in all subcohorts for both models, except for ADHD, where improvement was mainly observed for ROC-AUC in the logistic regression model, likely due to the smaller sample size of the ADHD subcohort. The largest performance gains were observed in MDD (increase in ROC-AUC of 0.077 for logistic regression, 0.071 for XGBoost; p < 0.001) and anxiety (increase in ROC-AUC of 0.077 for logistic regression, 0.108 for XGBoost; p < 0.001). This study provides one of the first comprehensive transdiagnostic evaluations of wearable-derived features for disease classification, highlighting their potential to enhance risk stratification in the real-world setting as a practical complement to clinical assessments and providing a foundation to explore more fine-grained wearable data.
Author summary: Wearable devices such as fitness trackers and smartwatches are becoming increasingly popular and affordable, providing continuous measurements of heart rate, physical activity, and sleep. Alongside the growing digitization of health records, this creates new opportunities for large-scale, real-world health studies. In this study, we analyzed wearable-derived physiological patterns across a range of chronic conditions spanning both physical and mental health to better understand how these signals relate to disease risk. We found that incorporating wearable-derived heart rate, activity and sleep features improved disease risk classification across several conditions, with particularly strong gains for major depressive disorder and anxiety. By examining how individual features contributed to model predictions, we also identified meaningful associations between physiological signals and disease risk. For example, both duration and day-to-day variation of deep and rapid eye movement (REM) sleep were associated with increased risk in certain conditions. Our study supports the development of real-time, automated tools to assess disease risk alongside clinical care.
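The ROC-AUC gains reported above come down to a rank statistic that is easy to compute directly: the probability that a randomly chosen case is scored above a randomly chosen control. A toy sketch with made-up scores (not study data), comparing a baseline model against one augmented with wearable features:

```python
def roc_auc(labels, scores):
    """ROC-AUC via the Mann-Whitney U statistic: the probability that a
    random case outranks a random control, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
baseline_scores = [0.7, 0.4, 0.5, 0.6, 0.3, 0.5]   # demographics/lifestyle only
wearable_scores = [0.9, 0.7, 0.8, 0.4, 0.2, 0.5]   # plus wearable features
print(roc_auc(labels, baseline_scores), roc_auc(labels, wearable_scores))
```

Here the wearable-augmented scores rank every case above every control (AUC 1.0), while the baseline scores do not; the study's reported deltas (e.g. +0.077 for MDD) are this same statistic computed on real cohorts.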
Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.
Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves its best scores, a macro-F1 of 0.735 and a micro-F1 of 0.708, under five-shot prompting. The fine-tuned model demonstrates consistent absolute improvements of 4%-5% in F1-scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
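The two imbalance fixes named above are standard and compact: inverse-frequency loss weights and a per-label threshold search on validation data. A sketch with tiny synthetic arrays (hypothetical probabilities, not HCC-Coder outputs):

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-label loss weights inversely proportional to label prevalence,
    so rare HCC codes are not drowned out by common ones."""
    return 1.0 / np.clip(labels.mean(axis=0), 1e-6, None)

def calibrate_thresholds(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """Per-label decision threshold chosen to maximize validation F1."""
    def f1(y, p, t):
        pred = p >= t
        tp = (pred & (y == 1)).sum()
        denom = pred.sum() + (y == 1).sum()
        return 2 * tp / denom if denom else 0.0
    return np.array([max(grid, key=lambda t: f1(labels[:, j], probs[:, j], t))
                     for j in range(labels.shape[1])])

# column 0: a rare label whose scores all sit below the default 0.5 cutoff
probs = np.array([[0.40, 0.9], [0.10, 0.8], [0.10, 0.2], [0.10, 0.7]])
labels = np.array([[1, 1], [0, 1], [0, 0], [0, 1]])
thresholds = calibrate_thresholds(probs, labels)
```

With a fixed 0.5 cutoff the rare label is never predicted; the calibrated threshold drops below its score range and recovers it, which is why per-label calibration helps under severe imbalance.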
Kritopoulos, G.; Neofotistos, G.; Barmparis, G. D.; Tsironis, G. P.
Show abstract
Class imbalance in clinical electrocardiogram (ECG) datasets limits the diagnostic sensitivity of automated arrhythmia classifiers, particularly for rare but clinically significant beat types. We propose a three-stage hybrid generative pipeline that combines a spectral-guided conditional Variational Autoencoder (cVAE), a class-conditional latent Denoising Diffusion Probabilistic Model (DDPM), and a Quantum Latent Refinement (QLR) module built on parameterized quantum circuits to augment minority arrhythmia classes in the MIT-BIH Arrhythmia Database. The QLR module applies a bounded residual correction guided by Maximum Mean Discrepancy minimization to align synthetic latent distributions with real class-specific latent banks. A lightweight 1D MobileNetV2 classifier evaluated over five independent random seeds and four augmentation ratios serves as the downstream benchmark. Our findings establish latent diffusion augmentation as an effective strategy for imbalanced ECG classification and motivate further investigation of quantum-classical hybrid methods in cardiac diagnostics.
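The QLR module's alignment objective is Maximum Mean Discrepancy minimization between synthetic and real latent distributions. A minimal sketch of the standard biased squared-MMD estimator with an RBF kernel, using toy latent vectors; the kernel choice and `gamma` value are assumptions, not details from the paper:

```python
import math

def rbf(u, v, gamma=1.0):
    """RBF kernel k(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def mmd2(X, Y, gamma=1.0):
    """Biased squared Maximum Mean Discrepancy between sample sets X and Y:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / len(X) ** 2
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

MMD² is near zero when the two sample sets are drawn from the same distribution and grows as they separate, so minimizing it pulls the synthetic latents toward the real class-specific latent banks.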
Ross, J. M.; Forman, L.; Hassan, U.; Gogulski, J.; Truong, J.; Cline, C. C.; Parmigiani, S.; Chen, N.-F.; Hartford, J. W.; Fujioka, T.; Makeig, S.; Pascual-Leone, A.; Keller, C. J.
Show abstract
Neural excitability fluctuates with sensory events, creating windows of opportunity to enhance brain stimulation. Repetitive transcranial magnetic stimulation (TMS), including intermittent theta burst stimulation (iTBS), is a promising treatment for neurological and psychiatric disorders, but does not account for fluctuations in neural excitability, likely contributing to variable outcomes. Sensory Entrained TMS (seTMS) leverages sensorimotor oscillations to enhance corticospinal responses, but the sustained effects as a repetitive protocol are unknown. We extend seTMS to iTBS, measuring motor-evoked potentials (MEPs) as a physiological readout. In a randomized crossover study comparing standard iTBS with sensory entrained iTBS (se-iTBS; n=20), we found that se-iTBS more than doubled the MEP effect (55% vs 26% MEP enhancement) and persisted for at least 30 minutes. Notably, at least 80% of participants showed larger responses with se-iTBS at all time points. se-iTBS may provide a robust and practical framework for optimizing TMS that bridges electrophysiological mechanisms and clinical applications.
Dai, H.-J.; Mir, T. H.; Fang, L.-C.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.
Show abstract
Accurate recognition and deidentification of sensitive health information (SHI) in spoken dialogues requires multimodal algorithms that can understand medical language and contextual nuance; errors in either step risk exposing SHI. Additionally, the variability and complexity of medical terminology, along with the inherent biases in medical datasets, further complicate this task. This study introduces the SREDH/AI-Cup 2025 Medical Speech Sensitive Information Recognition Challenge, which focuses on two tasks: Task 1, medical speech transcription, in which systems must accurately transcribe speech into text; and Task 2, medical speech de-identification, in which systems must detect and appropriately classify mentions of SHI. The competition attracted 246 teams; top-performing systems achieved a mixed error rate (MER) of 0.1147 and a macro F1-score of 0.7103, with average MER and macro F1-score of 0.3539 and 0.2696, respectively. Results were presented at the IW-DMRN workshop in 2025. Notably, LLMs were prevalent across both tasks: 97.5% of teams adopted LLMs for Task 1 and 100% for Task 2, highlighting their growing role in healthcare. Furthermore, we finetuned six models, demonstrating strong precision (~0.885-0.889) with slightly lower recall (~0.830-0.847), resulting in F1-scores of 0.857-0.867.
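The abstract reports a mixed error rate (MER) without defining it. One common formulation in speech evaluation is the match error rate, (S + D + I) / (S + D + I + C), i.e. word-level errors over aligned words plus errors; whether the challenge used exactly this definition is an assumption. A minimal sketch via dynamic-programming alignment:

```python
def mer(ref, hyp):
    """Match error rate (S + D + I) / (S + D + I + C) over word lists,
    where C is the number of matched words on a minimum-error alignment."""
    R, H = len(ref), len(hyp)
    # dp[i][j] = (errors, -matches) for aligning ref[:i] with hyp[:j];
    # lexicographic min gives fewest errors, ties broken by most matches.
    dp = [[(0, 0)] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        dp[i][0] = (i, 0)                                       # deletions
    for j in range(1, H + 1):
        dp[0][j] = (j, 0)                                       # insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            if ref[i - 1] == hyp[j - 1]:
                diag = (dp[i - 1][j - 1][0], dp[i - 1][j - 1][1] - 1)   # match
            else:
                diag = (dp[i - 1][j - 1][0] + 1, dp[i - 1][j - 1][1])   # substitution
            dp[i][j] = min(diag,
                           (dp[i - 1][j][0] + 1, dp[i - 1][j][1]),      # deletion
                           (dp[i][j - 1][0] + 1, dp[i][j - 1][1]))      # insertion
    errors, neg_matches = dp[R][H]
    matches = -neg_matches
    return errors / (errors + matches) if errors + matches else 0.0
```

Unlike plain word error rate, this quantity is bounded in [0, 1], which makes scores like the winning 0.1147 directly interpretable as a fraction of misaligned words.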